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Abstract. Polymerization of Ceo molecular crystal under high pressure and high temperature 
is simulated by using linear scaling tight binding molecular dynamics (TBMD) with Graphic 
Processing Unit (GPU) as a computational accelerator for matrix-matrix multiplication. Two 
sets of tight binding parameters were tested. 



1. Introduction 

High pressure transformation of Ceo molecular crystal has been attracting a great deal of 
attention in the past years because of its potential superconductivity and super-hardness. At 
ambient conditions, Ceo molecules crystallize into a face-centered-cubic (fee) structure with weak 
van der Waals interactions. Under extreme compression, the fee Ceo crystal may transform 
into polycrystalline diamond, diamond-like materials, or graphitic forms. Yamanaka et al. 
[1, 2, 3] have succeeded in synthesizing two dimensional and three dimensional Ceo polymers 
at moderate pressure and temperature by forming covalent bonds between Ceo molecules via 
hybridization change of sp^ bond to sp^ bond. Several polymorphs of 3-d Ceo have been 
theoretically proposed [3, 4, 5, 6, 7] and studied by using electronic structure calculation, 
geometry optimization and phonon calculation. However, the experimentally observed structure 
is not fully consistent with any of theoretically predicted models. Therefore molecular dynamics 
simulation of polymerization of Ceo is important to understand the crystal structure. In 
this paper we introduce a large scale tight binding molecular dynamics simulation of 3-d Ceo 
polymer formation with thousands of carbon atoms in the unit cell. Such large scale quantum 
simulation was achieved by the combination of linear scaling tight-binding molecular dynamics 
(TBMD) [8, 9, 10, 11, 12] and computational acceleration with graphic processing unit (GPU) 



[13, 14, 15, 16]. 



2. Linear scaling method 

The Hamiltonian of non-selfconsistent field tight binding method is written as 




(1) 



where i and j are the indices of atoms and a and /3 are the indices of orbitals of the atom while 
and c are the creation and annihilation operators of electron. 
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Figure 1. Krylov basis of order r with the seed vector |z = 0) for one dimensional uniform 
chain. 

The one-body density matrix of the Hamiltonian (1) is defined as 

p = ^\m)f{em){m\ (2) 

m 

where \m) are the eigenstates of (1) and /(e^) is the occupation number of the eigenstate \m). 
The density matrix contains ah information of the system state. For example, the force on the 
z-th atom can be calculated as (Fi) = Tr[pF^] by using the density matrix and the force matrix. 
The computational cost is 0{N^) due to the diagonalization of Hamiltonian where N is the total 
number of atoms. 

Linear scaling of the computational cost with respect to is achieved by exploiting the 
locality of the density matrix, or solving the local electronic properties of the target atom by 
taking into account the atoms within the cutoff radius from the target atom only [8, 9, 10, 11, 12]. 
Then the computational cost becomes 0{n^N) where n is the number of atoms within the cutoff 
radius from the target atom. 

3. Krylov subspace 

The Hilbert space of the Hamiltonian of the atoms within the cutoff radius from the target atom 
is expressed as 

span {|1, a) , |2, a) , |3, a) , • • • |z, a) • • • , |n, a)} (3) 

where n is the number of the atoms in the cutoff sphere and a is the index of the atomic orbital, 
which is hereafter omitted for simplicity. The Krylov subspace of order r with the seed vector 
10), Kr{H^ 10)), is generated by multiplying H repeatedly on the seed vector 10). 

KriH, \<P)) = span {!</.) , H \<j>) , \<P) , ■ ■ ■ , H'-' !</>)} • (4) 

Here we use the atomic orbital of the z-th atom as the seed vector, |0) = |z), to calculate the 
force on the z-th atom. This is quite different from the usual Lanczos diagonalization methods 
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Figure 2. Equations of state of the cubic diamond: (1) calculated data with Xu's parameter 
[17], (2) calculated data with Omata's parameter [18], (3) experimental data by Dewaele et al. 



where the random vector is used for the seed vector. The accuracy of the Krylov subspace as an 
approximant of the original Hilbert space increases rapidly as r increases up to m, where m < n 
is the degree of the minimal polynomial of H with respect to It was shown [8] that Krylov 
subspace of order r <^n can describe the system with sufficient accuracy. The efficiency of the 
Krylov subspace can be understood from the physical point of view. Since the tight binding 
Hamiltonian (1) contains the hopping terms Vij^ multiplication of H on \i) makes the electron 
hop from the z-th atom to the neighbor atoms linked by the hopping terms. Multiplying H 
twice on \i) makes the electron spread to the next nearest neighbors. Therefore the Krylov basis 
generated from the seed vector \i) is localized at the target atom and then gradually spreads 
out to nearby atoms. Let us examine this with the simplest example of one dimensional uniform 
chain whose Hamiltonian matrix elements (1) are all zero except ti^i±i = 1 for all i. Then the 
Krylov basis of order r can be expressed in terms of binomial coefficients. 



In Fig. 1 the Krylov basis in real space representation is shown. Krylov subspace of small order 
(r ^ 30) can represent quantum states localized around the target atom accurately because it 
samples the wave function around the atom with larger weight. 

Since Krylov basis (4) is not orthogonal, Lanczos algorithm is often used to construct 
orthogonal basis set from it. Lanczos algorithm transforms an Hermitian matrix H to a, 
tridiagonal real symmetric matrix T = HV where the transform matrix V = ('^i, '^2? • ' ' ? '^r) 
consists of orthogonal basis Vjn called Lanczos vector. The Lanczos vector Vj^ and the diagonal 
and off-diagonal matrix elements of T (ai and are calculated by the three term recursion 
formula: 



[19]. 



r 




(5) 



k=0 



H \vk) - au \vk) - Pk-i \vk-i) 



(6) 



\vk+i) = -r \lk) 
Pk 



(7) 



where — 0,i;i = |^),Q^/c — {'^k\H\vk)^ f^k — V {h\h) ^ind /3o = 0. Once T is obtained, the 
eigenvalues and eigenvectors of this small sized (r x r) symmetric tridiagonal matrix are easily 
calculated by using common linear algebra libraries such as LAPACK. 

In order to calculate all forces acting on the N atoms we have to repeat the above procedure 
for each atom. By rearranging the atoms into small groups consisting of m atoms {m ^ n ^ N) 
and calculating simultaneously the forces acting on the m atoms, we can rewrite the three term 
recursion formula (6) in the matrix-matrix multiplication form (Level 3 BLAS form) [20] , which 
is efficiently calculated with GPU. 

Lk = HVk - AkVk - Bk-iVk-i (8) 
14+1 = B^^Lk (9) 

where Lk = (l4^^)' l4^^)' " ' ' 14"^^))' = (^i^^ '^i^^ ' " ' '^i'^^)' is diagonal matrix 

[a^j^'' , a^^'' , • • • , a^^''], and Bj. is diagonal matrix [/^^"^^ , , • • • , . 

4. Molecular dynamics of Ceo 

The tight binding method was applied to the equation of state (EOS) of diamond and molecular 
dynamics of Ceo polymerization. For this purpose, two types of tight binding model parameters, 
Xu's parameter[17] and Omata's parameter[18] were used. Both parameter sets have the same 
functional forms, but Omata's parameter was constructed to reproduce the LDA energetics of 
not only the sp^ graphene and sp^ diamond covalent bonds but also that of the sp^ interlayer 
interaction. The cutoff radius of Xu's parameter is 2.6 A, while that of Omata's parameter is 
7.0 A so that it can include interaction between sp^ interlayers, which may be important for 
polymerization of Ceo- 

The EOS of 4x4x4 supercell of cubic diamond was calculated for Xu's and Omata's TB 
parameters (Fig. 2). Convergence was tested with respected to the number of atoms in the 
cutoff radius n, the order of Krylov subspace r and size of supercell (by using 8x8x8 supercell). 
The EOS of Xu's TB parameter reproduces the experimental result well while the Omata's 
parameter produces stiffer EOS. 

Fig. 3 shows the 2x2x2 fee structure of the Ceo at OK and OGPa as the initial structure. 
Fig. 4 shows the snapshot after 3 ps NPT simulation with Xu's parameter at 25 GPa and 500K, 
and Fig. 5 shows the snapshot after 3 ps NPT simulation with Omata's parameter at 25 GPa 
and 500K. Simulation with Omata's parameter successfully reproduced the polymerization of 
Ceo under high temperature and high pressure. However, the produced polymer was not perfect 
crystal. The finer control of temperature and pressure would be required to obtain better results. 

5. Acceleration by GPU 

Graphics Processing Unit (GPU) is an electronic device for accelerating graphic processing on 
personal computers and game machines. GPU can perform numerical calculation much faster 
than CPU in some types of computation such as matrix-matrix multiplication [13], fast Fourier 
transformation (FFT) [13], and the N-body problem [15]. Application of GPU computation is 
rapidly expanding to many fields of science including brain science, quantum science, astronomy 
and fluid dynamics [16]. In this paper, we apply GPU computation to tight binding molecular 
dynamics. According to the Amdal's law [21] the speedup of a program using GPU is limited by 
the time needed for the CPU fraction of the program. The breakdown of computational time of 
the original CPU version of ELSES [10] shows that the most time consuming part is the matrix- 
vector multiplication in the Lanczos algorithm which is 99.92 % of the total computation time in 



Figure 3. the 2x2x2 fee strueture of the Ceo at OGPa and OK. 




Figure 4. Snapshot after 3 ps NPT simulation with Xu's parameter at 25 GPa and 500K. 




Figure 5. Snapshot after 3 ps NPT simulation with Omata's parameter at 25 GPa and 
500K. 



our case. Therefore the calculation can be significantly accelerated by using fast matrix-vector 
operation. For this purpose, we adopted CUBLAS, a linear algebra hbrary on GPU [13]. The 
elapse time for calculating the pressure of 4x4x4 supercell of diamond with Omata's parameter 
was 109.53 s for GPU version with m = 4 and 504.3 s for the original CPU version of ELSES [10]. 
Therefore the GPU version is five times faster than the CPU version. The elapse time for Xu's 
parameter was 55.65 s for GPU version and 29.19 s for CPU version. The CPU calculation is 
efficient when the interaction is short ranged and H becomes sparse matrix, while GPU becomes 
very efficient when the interaction is long ranged and the Hamiltonian becomes dense matrix. 
We used Sun Fire X2250 eight core workstation for CPU computation and NVIDIA GeForce 
GTX8800 for GPU computation. 

In summary, formation process of Ceo polymer under high pressure and high temperature 
was simulated by the linear scale tight binding molecular dynamics with hardware acceleration 
by GPU. Two sets of tight binding parameters were tested. Xu's parameter has short ranged 
interaction resulting in fast simulation on CPU but does not faithfully reproduce polymerization. 
Omata's parameter has long ranged interaction and enjoy the acceleration by GPU. It reproduces 
sp^-sp^ hybridization change successfully but does not reproduce experimental EOS accurately. 
Further development of accurate tight binding parameters for Ceo polymer would be necessary. 
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